Abstract. Choosing the regularization parameter for inverse problems is of major
importance for the performance of the regularization method. 

We will introduce a fast version of the Lepskij balancing principle and show that it 
is a valid parameter choice method for Tikhonov regularization both in a deterministic 
and a stochastic noise regime as long as minor conditions on the solution are fulfilled. 
1. Introduction



In the following we will consider linear inverse problems [EHN96, Hof86] given as an
operator equation

Ax = y, (1) 

where $A : X \to Y$ is a linear, continuous, compact operator acting between separable
real infinite-dimensional Hilbert spaces $X$, $Y$. Without loss of generality, we assume that
$A$ has a trivial null space $N(A) = \{0\}$. Since $A$ is compact and $X$ is infinite dimensional,
$A$ does not have a continuous inverse, which makes (1) ill-posed.

For some definitions, but not for the methods themselves, we will need the singular
value decomposition of $A$. There exist orthonormal bases $\{u_k\}_{k\in\mathbb{N}}$ of $X$ and $\{v_k\}_{k\in\mathbb{N}}$ of
$Y$ and a sequence of decreasing singular values $(\sigma_k)_{k\in\mathbb{N}}$ such that

$$Ax = \sum_{k=1}^{\infty} \sigma_k \langle x, u_k\rangle\, v_k. \tag{2}$$

Moreover, we assume that the data $y$ are noisy; the noise model for $\xi$ will be specified
later:

$$y^\delta = Ax + \xi, \qquad \xi \text{ noise}. \tag{3}$$

In order to counter the ill-posedness, we need to regularize; in this article we will consider 
only Tikhonov regularization: 

$$x_n^\delta = A_n^{-1} y^\delta := (A^*A + q_0 q^n)^{-1} A^* y^\delta. \tag{4}$$

The level $n$ will now be called the regularization parameter; $q_0 > 0$ and $0 < q < 1$ are
constants which are discussed later. The noise-free regularized solution is defined as

$$x_n = A_n^{-1} y = (A^*A + q_0 q^n)^{-1} A^* y. \tag{5}$$
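As a concrete illustration of Tikhonov regularization on a geometric grid of parameters, the following minimal sketch works in a diagonal toy model, where the operator acts on coefficients with respect to its singular bases. The singular values `sigma`, the exact solution `x_true`, the truncation to `K` coefficients and the grid constants are hypothetical choices made only for this example; they are not taken from the paper.

```python
import math


def tikhonov_coefficients(sigma, data, alpha):
    """Coefficients of (A*A + alpha)^{-1} A* y in the singular basis.

    sigma[k] are singular values and data[k] the coefficients of y
    (diagonal toy model, an assumption of this sketch).
    """
    return [s * d / (s * s + alpha) for s, d in zip(sigma, data)]


def alpha_grid(q0, q, n_max):
    """Geometric grid q0 * q**n for n = 0, ..., n_max, with 0 < q < 1."""
    return [q0 * q ** n for n in range(n_max + 1)]


# Hypothetical toy problem: sigma_k = k^-2, x_k = k^-1.5, noise-free data.
K = 200
sigma = [k ** -2.0 for k in range(1, K + 1)]
x_true = [k ** -1.5 for k in range(1, K + 1)]
y = [s * x for s, x in zip(sigma, x_true)]

# Approximation error ||x_n - x|| along the grid.
errors = []
for alpha in alpha_grid(q0=1.0, q=0.5, n_max=30):
    x_n = tikhonov_coefficients(sigma, y, alpha)
    errors.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x_n, x_true))))
```

In this noise-free setting the list `errors` is non-increasing in the level, which matches the monotonicity of the approximation error of Tikhonov regularization.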



The correct choice of the regularization parameter is of major importance for the 
performance of the method. In recent times, a number of articles [GP00, MP03, RP05,
BH05, MP06, HPR07, BHM09] have considered the Lepskij balancing principle [Lep90]
for choosing this parameter in various situations.

One of the major disadvantages of this parameter choice method in comparison to, 
for instance, the Morozov discrepancy principle (cf., e.g., [EHN96]) is that one needs
to compute all regularized solutions up to a maximal regularization parameter. On the 
other hand, this buys stability even in the face of stochastic noise. 

We will show that a simplification of the Lepskij balancing principle (now called fast
balancing) yields a valid parameter choice method which performs at least as well as
the original. This idea has already been presented in a different form in [RH08], however
in a purely deterministic setting with a focus on convergence results. In contrast, our
goal is to provide feasible error bounds in realistic situations without requiring the noise
level to be near to or converge to 0. There are methods (see, e.g., [HPR08]) which obtain
a better solution by combining several regularized solutions; however, as their impact on
practice has been rather small so far, we keep our focus on choosing the best solution from
a set of given ones.

The outline of this paper is as follows. First (Section 2) we will specify the conditions
on the solution $x$ and describe two different scenarios of noise, namely a stochastic and
a deterministic one. In Section 3 we will introduce the Lepskij balancing principle and
the fast balancing principle. In the two following sections, 4 and 5, we will show oracle
inequalities for the new method.



2. Prerequisites 

For Tikhonov regularization it always holds that $\|x - x_{n+1}\| \le \|x - x_n\|$. We will need
a slightly more powerful inequality at this point and therefore use a set of assumptions
introduced in [BK08].

Assumption 2.1 Let $x$ be such that either (see [BK08], eq. (9), slightly rewritten)

$$\|(A^*A)^{-1} x\|_X < \infty$$

or that there exist constants $\gamma > 0$, $1 > \nu > 0$, $C_{1,\nu} > 0$ and $D_{1,\nu} > 0$ such that for all
$0 < t < \gamma$ (see [BK08], Definition 2.2)

$$D_{1,\nu}\, t^{2\nu} \;\ge \sum_{\{k \,:\, \sigma_k^2 < t\}} \langle x, u_k\rangle^2 \;\ge\; C_{1,\nu}\, t^{2\nu}.$$

Remark 2.2 There are other functional analytic formulations of this assumption (cf.
[KN08]); however, the general idea is similar, namely that $x$ should have a rather
uniform distribution of the energy in its coefficients $\langle x, u_k\rangle$.

Lemma 2.3 (cf. [BK08]) Let $x$ fulfill Assumption 2.1. Then it holds

$$\|x - x_{n+1}\| \le w_1 \|x - x_n\| \tag{6}$$

with $0 < w_1 < 1$.

Now we will introduce two different noise models. Classically, one considers deterministic
noise.

Definition 2.4 (Deterministic Noise) $\xi$ is called deterministic noise of noise level $\delta$
if

$$\|\xi\| \le \delta.$$

The noise behavior $\rho(\cdot)$ with respect to the regularization parameter $n$ is defined as

$$\rho(n) = \|A_n^{-1}\|\,\delta \le \delta\,(q_0 q^n)^{-1/2}.$$



As all the results can be transferred easily to the case of colored noise by modifying the 
function p we will now just consider white noise. 
Definition 2.5 (Stochastic Noise) Let $\xi$ be a white noise Gaussian random variable,
i.e., the $\langle \xi, v_k\rangle$ are independent and identically distributed (iid) according to the distribution
$\mathcal{N}(0, \delta^2)$. The noise behavior $\rho(\cdot)$ with respect to the regularization parameter $n$ is defined as

$$\rho(n)^2 = \mathbb{E}\|A_n^{-1}\xi\|^2 = \delta^2 \operatorname{trace}\big((A_n^{-1})^* A_n^{-1}\big).$$

In both cases this trivially yields 

Lemma 2.6 There exists $1 < w_2$ such that

$$\rho(n) \le \rho(n+1) \le w_2\, \rho(n). \tag{7}$$

Remark 2.7 Almost every other constant in this article will be based on $w_1$ and $w_2$.
Though desirable, we cannot give a general upper or lower bound for these constants;
they are purely problem dependent. At first glance this might look like a disadvantage;
however, it also means that we get problem-specific optimal results.
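The growth factor $w_2$ of the noise behavior can be inspected numerically in a toy setting. The sketch below assumes a diagonal model with hypothetical singular values `sigma`, a geometric parameter grid `q0 * q**n` and finitely many coefficients; in such a model the operator norm and the trace reduce to a maximum and a sum over the Tikhonov filter values. None of the numerical values are taken from the paper.

```python
import math


def rho_deterministic(sigma, alpha, delta):
    """delta * ||A_n^{-1}||: the norm is the largest filter value."""
    return delta * max(s / (s * s + alpha) for s in sigma)


def rho_stochastic(sigma, alpha, delta):
    """sqrt(E ||A_n^{-1} xi||^2) for white noise with variance delta^2."""
    return delta * math.sqrt(sum((s / (s * s + alpha)) ** 2 for s in sigma))


# Hypothetical toy values (assumptions of this sketch).
K, q0, q, delta = 200, 1.0, 0.5, 1e-3
sigma = [k ** -2.0 for k in range(1, K + 1)]
alphas = [q0 * q ** n for n in range(31)]

rho_det = [rho_deterministic(sigma, a, delta) for a in alphas]
rho_sto = [rho_stochastic(sigma, a, delta) for a in alphas]

# Largest observed ratio rho(n+1)/rho(n); it plays the role of w_2 on this grid.
w2_det = max(b / a for a, b in zip(rho_det, rho_det[1:]))
w2_sto = max(b / a for a, b in zip(rho_sto, rho_sto[1:]))
```

In this particular model each filter value grows by at most the factor `1/q` from one level to the next, so both empirical ratios lie between 1 and `1/q`. How close they come to either end depends on the singular values, which reflects the problem dependence of the constants.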

3. Fast Balancing 

The question of the optimality of a regularization parameter $n_{\mathrm{opt}}$ is rather difficult in a
deterministic setting. The most natural definition, namely

$$n_{o} = \operatorname*{argmin}_{n} \|x_n^\delta - x\|,$$

would be best. However, there is no known concept which leads to proofs in a general
setting. Thus we will use the second-best solution. Using the triangle inequality, the
error $\|x_n^\delta - x\|$ is bounded by the sum of a decreasing function $\|x_n - x\|$ and an increasing
function $\|A_n^{-1}\xi\|$, which itself is bounded by $\rho(n)$:

$$\|x_n^\delta - x\| \le \|x_n - x\| + \|A_n^{-1}\xi\| \le \|x_n - x\| + \rho(n).$$

Again, the point

$$n_{oo} = \operatorname*{argmin}_{n} \big(\|x_n - x\| + \rho(n)\big)$$

is inaccessible. However, using the definition (8) below we can at least guarantee that

$$2\big(\|x_{n_{oo}} - x\| + \rho(n_{oo})\big) \ge \|x_{n_{\mathrm{opt}}} - x\| + \rho(n_{\mathrm{opt}}).$$

Interestingly, the stochastic case is much easier due to the independence of $\xi$ and $x$:

$$\mathbb{E}\|x_n^\delta - x\|^2 = \|x_n - x\|^2 + \rho(n)^2 + 2\,\mathbb{E}\big\langle (A_n^{-1})^*(x_n - x), \xi\big\rangle = \|x_n - x\|^2 + \rho(n)^2.$$

In this case, the parameter $n_{\mathrm{opt}}$ as defined below is really optimal on average:

Definition 3.1 (Optimal Parameter) The optimal parameter $n_{\mathrm{opt}}$ is defined such
that

$$\|x_{n_{\mathrm{opt}}} - x\| \ge \rho(n_{\mathrm{opt}}) \quad\text{and}\quad \|x_{n_{\mathrm{opt}}+1} - x\| < \rho(n_{\mathrm{opt}}+1). \tag{8}$$
Remark 3.2 This parameter, of course, only exists when the noise level $\delta$ is sufficiently
small. However, if this is not the case, the noise described by $\rho(\cdot)$ dominates the
information $x$ so much that $0$ would be the best regularized solution. It is common
practice to assume that this parameter $n_{\mathrm{opt}}$ exists.
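The crossing condition that defines the optimal parameter can be made concrete with a small sketch. It assumes that the approximation errors $\|x_n - x\|$ and the noise behavior $\rho(n)$ are available as plain lists `bias` and `rho`; this is only possible in a synthetic test where $x$ is known, and the numbers below are hypothetical.

```python
def optimal_parameter(bias, rho):
    """Return n with bias[n] >= rho[n] and bias[n+1] < rho[n+1], else None.

    bias[n] stands for ||x_n - x|| (non-increasing) and rho[n] for the
    noise behavior (non-decreasing); both are inputs of this sketch.
    """
    for n in range(len(bias) - 1):
        if bias[n] >= rho[n] and bias[n + 1] < rho[n + 1]:
            return n
    return None  # the noise dominates from the first level on


# Hypothetical numbers: the error halves and the noise doubles per level.
bias = [1.0, 0.5, 0.25, 0.125]
rho = [0.1, 0.2, 0.4, 0.8]

n_opt = optimal_parameter(bias, rho)
best_sum = min(b + r for b, r in zip(bias, rho))
sum_at_opt = bias[n_opt] + rho[n_opt]
```

For these numbers the crossing happens at `n_opt == 1`, and the bound `bias[n] + rho[n]` at that level is within a factor of two of its minimum over all levels. When the noise dominates from the start, as in `optimal_parameter([0.1, 0.05], [1.0, 2.0])`, the function returns `None`.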

Now we define the fast and the Lepskij balancing principles and perform a first
comparison.

Definition 3.3 (Balancing Functional) Let $k \ge 1$. The balancing functional is
defined as

$$b_k(n) = \max_{n < m \le n+k} \left\{ 4^{-1}\, \|x_n^\delta - x_m^\delta\|\, \rho(m)^{-1} \right\}. \tag{9}$$

Definition 3.4 (Lepskij Balancing Principle) Let $b_k(n)$ be defined as in (9). Furthermore,
let $\tau \ge 1$ and $N > n_*$. The Lepskij balancing parameter $n_L(\tau, N) = n_L$ is defined
as

$$n_L(\tau, N) = \operatorname*{argmin}_{n} \left\{ b_k(m) \le \tau \;\; \forall\, N > m \ge n \right\}. \tag{10}$$

Definition 3.5 (Fast Balancing Principle) Let $b_k(n)$ be defined as in (9) and let $\tau > 0$.
The fast balancing parameter $n_* = n_*(\tau)$ is defined as

$$n_*(\tau) = \operatorname*{argmin}_{n} \left\{ b_k(n) \le \tau \right\}, \tag{11}$$

i.e., as the smallest $n$ for which $b_k(n) \le \tau$.

Remark 3.6 In contrast to the Lepskij balancing principle [MP03, MP06, BM07], no
upper bound $N$ is needed.
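The practical difference between the two principles can be sketched as follows: fast balancing walks upward through the levels and stops at the first one whose balancing functional is at most $\tau$, while a Lepskij-type rule has to inspect the levels up to $N$. The code is a minimal sketch under stated assumptions: a diagonal toy model with hypothetical singular values and exact solution, deterministic noise of norm `delta`, the upper bound $\delta (q_0 q^n)^{-1/2}$ used as `rho`, and one possible reading of the Lepskij rule. The safety cap `n_max` is an addition of this sketch and not part of the method.

```python
import math
import random
from functools import lru_cache


def distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def balancing_functional(solution, rho, n, k):
    """max over n < m <= n+k of ||x_n - x_m|| / (4 rho(m))."""
    x_n = solution(n)
    return max(distance(x_n, solution(m)) / (4.0 * rho(m))
               for m in range(n + 1, n + k + 1))


def fast_balancing(solution, rho, tau=1.0, k=1, n_max=200):
    """Smallest n whose balancing functional is <= tau (n_max: safety cap)."""
    for n in range(n_max):
        if balancing_functional(solution, rho, n, k) <= tau:
            return n
    return n_max


def lepskij_balancing(solution, rho, N, tau=1.0, k=1):
    """One reading of the Lepskij rule: smallest n such that the
    functional is <= tau for every m with n <= m < N."""
    n_L = N
    for n in range(N - 1, -1, -1):
        if balancing_functional(solution, rho, n, k) > tau:
            break
        n_L = n
    return n_L


# Hypothetical toy problem (all values are assumptions of this sketch).
random.seed(0)
K, q0, q, delta = 200, 1.0, 0.5, 1e-3
sigma = [j ** -2.0 for j in range(1, K + 1)]
x_true = [j ** -1.5 for j in range(1, K + 1)]
noise = [random.gauss(0.0, 1.0) for _ in range(K)]
scale = delta / math.sqrt(sum(e * e for e in noise))   # ||noise|| = delta
y_delta = [s * x + scale * e for s, x, e in zip(sigma, x_true, noise)]


@lru_cache(maxsize=None)
def solution(n):
    """Tikhonov solution at level n; cached, so each level is computed once."""
    alpha = q0 * q ** n
    return tuple(s * d / (s * s + alpha) for s, d in zip(sigma, y_delta))


def rho(n):
    return delta * (q0 * q ** n) ** -0.5


n_star = fast_balancing(solution, rho)
computed_fast = solution.cache_info().currsize   # levels computed so far
n_L = lepskij_balancing(solution, rho, N=40)
computed_total = solution.cache_info().currsize
```

With `k = 1` the fast rule needs only the regularized solutions for the levels `0, ..., n_star + 1`, which is what `computed_fast` counts, whereas the Lepskij-type rule additionally computes solutions near the upper bound `N`. By construction, `n_L >= min(N, n_star)` holds in this sketch.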

Lemma 3.7 It holds for all admissible pairs $(N, \tau)$

$$n_L(\tau, N) \ge \min\{N, n_*(\tau)\}.$$

Proof

This is a direct consequence of (10) and (11). □

Remark 3.8 For the sake of simpler notation we will mostly refer to $n_L(\tau, N)$ by $n_L$
and to $n_*(\tau)$ by $n_*$. Only when the particular choice of $N$ and $\tau$ is of importance will we
keep these parameters.

The condition that $N > n_*$ is, of course, strictly speaking not verifiable without
knowing the optimal regularization parameter; however, this is a standard assumption in
all proofs for the Lepskij balancing principle and does not pose any particular problems
in practice, where $N$ is normally chosen in the range of the machine precision anyway.

4. Deterministic Case 

Now we will show that, as for the Lepskij balancing principle, an oracle inequality holds
for the fast balancing principle.

Theorem 4.1 Let $n_*$ and $n_{\mathrm{opt}}$ be defined as above. Furthermore, assume Assumption 2.1
to be valid. Then, for the constants $w_1$, $w_2$ defined in (6) and (7), independent of $\delta$,

$$\|x_{n_*}^\delta - x\| \le C \big(\|x_{n_{\mathrm{opt}}} - x\| + \rho(n_{\mathrm{opt}})\big),$$

where $C = \frac{4\tau+3}{2} \min_{1 \le \kappa \le k} \left\{ \frac{w_2^\kappa}{1 - w_1^\kappa} \right\}$.
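Proof

Assume $m > n > n_{\mathrm{opt}}$. Due to Assumption 2.1 and either definition of the noise, the
equations (6) and (7) hold, and we obtain with the triangle inequality

$$\|x_n^\delta - x_m^\delta\| \le \|x_n - x\| + \|x_m - x\| + \rho(n) + \rho(m) \le 4\rho(m)$$

and hence

$$b_k(n) \le 1$$

and thus $n_* \le n_{\mathrm{opt}}$ and hence $\rho(n_*) \le \rho(n_{\mathrm{opt}})$ due to (7). Using (8) we have

$$\|x_{n_*} - x\| \ge \rho(n_*).$$

On the other hand, for $n = n_*$ and $\kappa \le k$, using the inverse triangle inequality and (6)
resp. (7):

$$\begin{aligned}
4\tau\rho(n+\kappa) &\ge \|x_n^\delta - x_{n+\kappa}^\delta\| \\
&\ge \|x_n - x\| - \|x_{n+\kappa} - x\| - \rho(n) - \rho(n+\kappa) \\
&\ge (1 - w_1^\kappa)\,\|x_n - x\| - 2\rho(n+\kappa).
\end{aligned}$$

Hence

$$\|x_{n_*} - x\| \le (4\tau + 2) \min_{1 \le \kappa \le k} \left\{ \frac{w_2^\kappa}{1 - w_1^\kappa} \right\} \rho(n_*)$$

and so, using (7),

$$\begin{aligned}
\|x_{n_*}^\delta - x\| &\le (4\tau + 3) \min_{1 \le \kappa \le k} \left\{ \frac{w_2^\kappa}{1 - w_1^\kappa} \right\} \rho(n_*) \\
&\le (4\tau + 3) \min_{1 \le \kappa \le k} \left\{ \frac{w_2^\kappa}{1 - w_1^\kappa} \right\} \rho(n_{\mathrm{opt}}) \\
&\le \frac{4\tau + 3}{2} \min_{1 \le \kappa \le k} \left\{ \frac{w_2^\kappa}{1 - w_1^\kappa} \right\} \big(\|x_{n_{\mathrm{opt}}} - x\| + \rho(n_{\mathrm{opt}})\big). \qquad □
\end{aligned}$$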



Remark 4.2 Obviously, we obtain the best result for the minimal admissible $\tau$, i.e.,
$\tau = 1$. Furthermore, $C$ is independent of $\delta$, as $w_1$ and $w_2$ are.
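To get a feeling for the size of the oracle constant, the following sketch evaluates $C = \frac{4\tau+3}{2} \min_{1\le\kappa\le k} w_2^\kappa / (1 - w_1^\kappa)$, which is the form suggested by the last estimate of the proof. The function name and the sample values of $w_1$ and $w_2$ are hypothetical; in an application these constants are problem dependent.

```python
def oracle_constant(tau, k, w1, w2):
    """(4*tau + 3)/2 * min over 1 <= kappa <= k of w2**kappa / (1 - w1**kappa)."""
    if not (0.0 < w1 < 1.0 and w2 > 1.0 and tau >= 1.0 and k >= 1):
        raise ValueError("need 0 < w1 < 1 < w2, tau >= 1 and k >= 1")
    best = min(w2 ** kappa / (1.0 - w1 ** kappa) for kappa in range(1, k + 1))
    return 0.5 * (4.0 * tau + 3.0) * best


c_fast_decay = oracle_constant(tau=1.0, k=1, w1=0.5, w2=2.0)   # 3.5 * 4
c_slow_k1 = oracle_constant(tau=1.0, k=1, w1=0.9, w2=1.2)
c_slow_k3 = oracle_constant(tau=1.0, k=3, w1=0.9, w2=1.2)
```

For `w1 = 0.5`, `w2 = 2.0` the value is 14. When `w1` is close to 1, allowing a larger `k` can reduce the constant, because the minimum is then attained for some `kappa > 1`; in the example `c_slow_k3` is smaller than `c_slow_k1`.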
5. Stochastic Case

We will need a stochastic bound for the probability that the observed error is much 
bigger than our estimate $\rho(\cdot)$.

Lemma 5.1 (see, e.g., [BR08]) Let $Z = \sum_{k=1}^{\infty} \alpha_k \zeta_k^2$ with $\sum_{k=1}^{\infty} \alpha_k = 1$ and $\zeta_k \sim
\mathcal{N}(0, 1)$ iid. Assume that $\max_k \alpha_k > 0$. Then

$$\forall z > 0: \quad \mathbb{P}(Z > z) \le \sqrt{2}\, e^{-z/4}. \tag{12}$$

The proofs would become far too complicated if we used $b_k$ for arbitrary $k$. Therefore, we
will restrict our attention to the case $k = 1$, i.e., $b_1$. In numerical implementations
we observed that the results improve for slightly bigger $k$. From a certain
(problem-dependent) $k$ onwards, $k$ does not seem to have an influence on the
solution any more.

Lemma 5.2 Assume Assumption 2.1 to be valid. For $n < n_{\mathrm{opt}}$ there exist constants
$c_1, c_2 > 0$ depending on $w_1$ and $w_2$ but not depending on $\delta$ and $\tau$ such that

$$\mathbb{P}\{b_1(n) \le \tau\} \le c_1 \exp\big(-c_2 (n_{\mathrm{opt}} - n)^2\big) \exp(\tau^2). \tag{13}$$

Furthermore, for $n > n_{\mathrm{opt}}$ there exists a constant $c_3 > 0$ depending on $w_1$ and $w_2$ but not
depending on $\delta$ and not depending on $\tau$ such that

$$\mathbb{P}\{b_1(n) > \tau\} \le c_3 \exp(-\tau^2/4). \tag{14}$$

Proof

Let $n < n_{\mathrm{opt}}$. Then, using (3)–(8), (12) and the triangle inequality, we
get (usage of the equations marked on top)

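$$\begin{aligned}
&\mathbb{P}\big\{4^{-1}\|x_n^\delta - x_{n+1}^\delta\|\,\rho(n+1)^{-1} \le \tau\big\} \\
&= \mathbb{P}\big\{4\tau\rho(n+1) \ge \|x_n^\delta - x_{n+1}^\delta\|\big\} \\
&\le \mathbb{P}\big\{4\tau\rho(n+1) \ge \|x_n - x\| - \|x_{n+1} - x\| - \|x_n^\delta - x_n\| - \|x_{n+1}^\delta - x_{n+1}\|\big\} \\
&\overset{(6)}{\le} \mathbb{P}\Big\{4\tau \ge \frac{(1 - w_1)\|x_n - x\|}{\rho(n+1)} - \frac{\|x_n^\delta - x_n\|}{\rho(n+1)} - \frac{\|x_{n+1}^\delta - x_{n+1}\|}{\rho(n+1)}\Big\} \\
&= \mathbb{P}\Big\{\frac{\|x_n^\delta - x_n\|}{\rho(n+1)} + \frac{\|x_{n+1}^\delta - x_{n+1}\|}{\rho(n+1)} \ge \frac{(1 - w_1)\|x_n - x\|}{\rho(n+1)} - 4\tau\Big\} \\
&\overset{(6),(7)}{\le} \mathbb{P}\Big\{\frac{\|x_n^\delta - x_n\|}{\rho(n+1)} + \frac{\|x_{n+1}^\delta - x_{n+1}\|}{\rho(n+1)} \ge \frac{(1 - w_1)\, w_1^{-(n_{\mathrm{opt}} - n)}\, \|x_{n_{\mathrm{opt}}} - x\|}{\rho(n_{\mathrm{opt}})} - 4\tau\Big\} \\
&\overset{(7),(8)}{\le} \mathbb{P}\Big\{\frac{\|A_n^{-1}\xi\|}{\rho(n)} + \frac{\|A_{n+1}^{-1}\xi\|}{\rho(n+1)} \ge (1 - w_1)\, w_1^{-(n_{\mathrm{opt}} - n)} - 4\tau\Big\} \\
&\le \mathbb{P}\Big\{\frac{\|A_n^{-1}\xi\|^2}{\rho(n)^2} \ge \Big(\tfrac{(1 - w_1)\, w_1^{-(n_{\mathrm{opt}} - n)}}{2} - 2\tau\Big)^2\Big\} + \mathbb{P}\Big\{\frac{\|A_{n+1}^{-1}\xi\|^2}{\rho(n+1)^2} \ge \Big(\tfrac{(1 - w_1)\, w_1^{-(n_{\mathrm{opt}} - n)}}{2} - 2\tau\Big)^2\Big\} \\
&\overset{(12)}{\le} 2\sqrt{2}\, \exp\Big(-\Big(\tfrac{(1 - w_1)\, w_1^{-(n_{\mathrm{opt}} - n)}}{2} - 2\tau\Big)^2 \Big/ 4\Big) \\
&\le 2\sqrt{2}\, \exp\Big(-\tfrac{(1 - w_1)^2\, w_1^{-2(n_{\mathrm{opt}} - n)}}{32}\Big) \exp(\tau^2) \\
&\le c_1 \exp\big(-c_2 (n_{\mathrm{opt}} - n)^2\big) \exp(\tau^2)
\end{aligned}$$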



for some appropriate constants $c_1$ and $c_2$. Now let $n > n_{\mathrm{opt}}$. Then, using the relations
above (among them (3), (5), (7), (8) and (12)) and the triangle inequality, we get

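$$\begin{aligned}
&\mathbb{P}\big\{4^{-1}\|x_n^\delta - x_{n+1}^\delta\|\,\rho(n+1)^{-1} > \tau\big\} \\
&= \mathbb{P}\big\{4\tau\rho(n+1) < \|x_n^\delta - x_{n+1}^\delta\|\big\} \\
&\le \mathbb{P}\big\{4\tau\rho(n+1) < \|x_n - x\| + \|x_{n+1} - x\| + \|x_n^\delta - x_n\| + \|x_{n+1}^\delta - x_{n+1}\|\big\} \\
&\overset{(7),(8)}{\le} \mathbb{P}\big\{(4\tau - 2)\rho(n+1) < \|x_n^\delta - x_n\| + \|x_{n+1}^\delta - x_{n+1}\|\big\} \\
&\le \mathbb{P}\Big\{2\tau < \frac{\|A_n^{-1}\xi\|}{\rho(n)} + \frac{\|A_{n+1}^{-1}\xi\|}{\rho(n+1)}\Big\} \\
&\le \mathbb{P}\Big\{\tau < \frac{\|A_n^{-1}\xi\|}{\rho(n)}\Big\} + \mathbb{P}\Big\{\tau < \frac{\|A_{n+1}^{-1}\xi\|}{\rho(n+1)}\Big\} \\
&\overset{(12)}{\le} 2\sqrt{2}\, \exp(-\tau^2/4). \qquad □
\end{aligned}$$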
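A tail bound of the form $\mathbb{P}(Z > z) \le \sqrt{2}\, e^{-z/4}$ for a weighted sum of squared standard Gaussians with nonnegative weights summing to one can be sanity-checked by simulation. The following sketch is only a Monte Carlo illustration; the weights, the sample size and the tested values of `z` are hypothetical choices.

```python
import math
import random


def tail_bound(z):
    """Right-hand side sqrt(2) * exp(-z / 4)."""
    return math.sqrt(2.0) * math.exp(-z / 4.0)


def empirical_tail(weights, z, samples, rng):
    """Monte Carlo estimate of P(sum_k weights[k] * zeta_k^2 > z)."""
    hits = 0
    for _ in range(samples):
        value = sum(a * rng.gauss(0.0, 1.0) ** 2 for a in weights)
        if value > z:
            hits += 1
    return hits / samples


rng = random.Random(1)
raw = [j ** -2.0 for j in range(1, 21)]
total = sum(raw)
weights = [r / total for r in raw]   # nonnegative weights summing to 1

# Triples (z, empirical tail probability, bound).
checks = [(z, empirical_tail(weights, z, 5000, rng), tail_bound(z))
          for z in (1.0, 2.0, 4.0, 8.0)]
```

In each triple of `checks` the empirical frequency stays below the bound. The bound is not sharp for small `z` (it even exceeds 1 for `z = 1`), but it decays exponentially, which is the property used in the proofs.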



Lemma 5.3 Assume Assumption 2.1 to be valid and constants $c_1, c_2$ as defined in Lemma
5.2. Let $k = 1$. Then for $n < n_{\mathrm{opt}}$

$$\mathbb{P}\{n_* = n\} \le c_1 \exp\big(-c_2 (n_{\mathrm{opt}} - n)^2\big) \exp(\tau^2). \tag{15}$$

Proof

$$\mathbb{P}\{n_* = n\} = \mathbb{P}\{b_1(n) \le \tau \text{ and } \forall\, m < n:\ b_1(m) > \tau\} \le \mathbb{P}\{b_1(n) \le \tau\} \le c_1 \exp\big(-c_2 (n_{\mathrm{opt}} - n)^2\big) \exp(\tau^2). \qquad □$$

The situation for $n > n_{\mathrm{opt}}$ becomes much more complicated. It all depends on
the question of how fast the events $\{b_1(n) > \tau\}$ and $\{b_1(m) > \tau\}$ decorrelate.
Therefore, we will first present both extreme cases and then discuss their
implications.

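Lemma 5.4 Assume Assumption 2.1 to be valid and $c_3$ defined as in Lemma 5.2. Let
$n > n_{\mathrm{opt}}$. Then

$$\mathbb{P}\big\{n_* \in \{n_{\mathrm{opt}} + 1, \dots, n\}\big\} \le c_3 \exp(-\tau^2/4). \tag{16}$$

Assume additionally that for all $m < n$ it holds $\mathbb{P}\{b_1(n) > \tau \text{ and } b_1(m) > \tau\} =
\mathbb{P}\{b_1(n) > \tau\}\, \mathbb{P}\{b_1(m) > \tau\}$. Then

$$\mathbb{P}\{n_* = n\} \le \big(c_3 \exp(-\tau^2/4)\big)^{n - n_{\mathrm{opt}}}. \tag{17}$$

Proof

Direct consequence of Lemma 5.2. □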

In the case of decorrelation, the likelihood of the event $n_* > n_{\mathrm{opt}}$ decreases very fast,
whereas in the worst case (i.e., the perfectly correlated case) the likelihood is constant.
Please note that in any case we are in a considerably better situation than with the
Lepskij balancing principle, where, depending on the upper bound $N$, the probability
is $N \exp(-\tau^2/4)$.
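The difference between the two extreme cases and the Lepskij-type bound can be illustrated with a few numbers. The sketch below takes $c_3 = 2\sqrt{2}$, the value that appears at the end of the proof of Lemma 5.2, and treats the upper bound `N = 50` and the value `tau = 3` as hypothetical inputs. The function names are introduced only for this example.

```python
import math

C3 = 2.0 * math.sqrt(2.0)   # value from the end of the proof of Lemma 5.2


def single_step_bound(tau):
    """c_3 * exp(-tau^2 / 4): the bound in the perfectly correlated case."""
    return C3 * math.exp(-tau ** 2 / 4.0)


def decorrelated_bound(tau, steps_past_opt):
    """(c_3 * exp(-tau^2 / 4)) ** (n - n_opt): the fully decorrelated case."""
    return single_step_bound(tau) ** steps_past_opt


def lepskij_type_bound(tau, N):
    """N * exp(-tau^2 / 4): bound depending on the upper level N."""
    return N * math.exp(-tau ** 2 / 4.0)


# single_step_bound(tau) < 1 exactly when tau > tau_min.
tau_min = 2.0 * math.sqrt(math.log(C3))

values = {
    "single": single_step_bound(3.0),
    "decorrelated_5": decorrelated_bound(3.0, 5),
    "lepskij_N50": lepskij_type_bound(3.0, 50),
}
```

With this value of $c_3$ the single-step bound is informative only for $\tau$ above roughly 2.04. For `tau = 3` the decorrelated bound decays geometrically in the distance from $n_{\mathrm{opt}}$, while the bound with the factor `N = 50` is still larger than 1.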

In reality we cannot expect complete decorrelation of events; however, considering
practical observations, the following relaxed condition seems to be reasonable:

Assumption 5.5 There exists a constant $C_3(w_1, w_2) \ge c_3$ such that for $n > n_{\mathrm{opt}}$

$$\mathbb{P}\{n_* = n\} \le \big(C_3 \exp(-\tau^2/4)\big)^{n - n_{\mathrm{opt}}}. \tag{18}$$

Assume furthermore that $\tau$ is big enough such that

$$C_3 \exp(-\tau^2/4)\, w_2 < 1. \tag{19}$$

Remark 5.6 Obviously we have no justified way to find out which $\tau$ fulfills (19);
however, as $C_3$ just depends on $w_1$ and $w_2$ and not on $\delta$, we also have this independence
from $\delta$ for $\tau$.

In most practical situations $\tau = 1$ seems to be sufficient, and a dependence on $\delta$ has
not been observed (which might of course just be due to the limited range of numbers
representable on modern computer hardware).
Theorem 5.7 Assume that Assumptions 2.1 and 5.5 are valid. Then we have

$$\mathbb{E}\|x_{n_*}^\delta - x\|^2 \le c_5 \Big(1 + c_6 \exp(\tau^2/2) + c_5\, \frac{1}{1 - C_3 \exp(-\tau^2/8)\, w_2}\Big) \big(\|x_{n_{\mathrm{opt}}} - x\|^2 + \rho(n_{\mathrm{opt}})^2\big)$$

with variables $c_5$ and $c_6$ just depending on $w_1$ and $w_2$.
Proof

Due to the independence of $x$ and $\xi$ and an inequality connecting the different moments
of Gaussian random variables (see, e.g., [BR08]), we have for $c_5 = (4\Gamma(3))^{1/4}$

$$\begin{aligned}
\mathbb{E}\|x - x_n^\delta\|^4 &\le \|x - x_n\|^4 + 2\|x - x_n\|^2\, \mathbb{E}\|x_n - x_n^\delta\|^2 + \big(c_5\, \mathbb{E}\|x_n - x_n^\delta\|^2\big)^2 \\
&\le \big(\|x - x_n\|^2 + c_5\, \mathbb{E}\|x_n - x_n^\delta\|^2\big)^2 \\
&\le c_5^2 \big(\|x - x_n\|^2 + \mathbb{E}\|x_n - x_n^\delta\|^2\big)^2 \\
&= c_5^2 \big(\|x - x_n\|^2 + \rho(n)^2\big)^2.
\end{aligned}$$



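For an appropriate constant $c_6$ independent of $\tau$ it holds, using (6), (7), (8), (15), (18)
and the Hölder inequality:

$$\begin{aligned}
\mathbb{E}\|x_{n_*}^\delta - x\|^2 &= \mathbb{E} \sum_{n=1}^{\infty} \|x - x_n^\delta\|^2\, \mathbf{1}_{\{n_* = n\}} \\
&\le \sum_{n=1}^{n_{\mathrm{opt}} - 1} \big(\mathbb{E}\|x - x_n^\delta\|^4\big)^{1/2} \big(\mathbb{E}\, \mathbf{1}_{\{n_* = n\}}\big)^{1/2} \\
&\quad + \mathbb{E}\|x_{n_{\mathrm{opt}}}^\delta - x\|^2 \\
&\quad + \sum_{n = n_{\mathrm{opt}} + 1}^{\infty} \big(\mathbb{E}\|x - x_n^\delta\|^4\big)^{1/2} \big(\mathbb{E}\, \mathbf{1}_{\{n_* = n\}}\big)^{1/2} \\
&\le \sum_{n=1}^{n_{\mathrm{opt}} - 1} c_5 \big(\|x - x_n\|^2 + \rho(n)^2\big) \Big(c_1 \exp\big(-c_2 (n_{\mathrm{opt}} - n)^2 / 2\big) \exp(\tau^2/2)\Big) \\
&\quad + \|x - x_{n_{\mathrm{opt}}}\|^2 + \rho(n_{\mathrm{opt}})^2 \\
&\quad + \sum_{n = n_{\mathrm{opt}} + 1}^{\infty} c_5 \big(\|x - x_n\|^2 + \rho(n)^2\big) \big(C_3 \exp(-\tau^2/8)\big)^{n - n_{\mathrm{opt}}} \\
&\le \sum_{n=1}^{n_{\mathrm{opt}} - 1} c_5 \big(w_1^{\,n - n_{\mathrm{opt}}}\, \|x - x_{n_{\mathrm{opt}}}\|^2 + \rho(n_{\mathrm{opt}})^2\big) \Big(c_1 \exp\big(-c_2 (n_{\mathrm{opt}} - n)^2 / 2\big) \exp(\tau^2/2)\Big) \\
&\quad + \|x - x_{n_{\mathrm{opt}}}\|^2 + \rho(n_{\mathrm{opt}})^2 \\
&\quad + \sum_{n = n_{\mathrm{opt}} + 1}^{\infty} c_5 \big(\|x - x_{n_{\mathrm{opt}}}\|^2 + w_2^{\,n - n_{\mathrm{opt}}}\, \rho(n_{\mathrm{opt}})^2\big) \big(C_3 \exp(-\tau^2/8)\big)^{n - n_{\mathrm{opt}}} \\
&\le \|x - x_{n_{\mathrm{opt}}}\|^2\, c_5 \Big(1 + c_6 \exp(\tau^2/2) + \frac{1}{1 - C_3 \exp(-\tau^2/8)}\Big) \\
&\quad + \rho(n_{\mathrm{opt}})^2\, c_5 \Big(1 + c_6 \exp(\tau^2/2) + c_5\, \frac{1}{1 - C_3 \exp(-\tau^2/8)\, w_2}\Big),
\end{aligned}$$

which proves the assertion. □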

Even if we cannot assume decorrelation (Assumption 5.5), we still have a convergence
result due to $n_L(\tau) \ge \min\{N, n_*(\tau)\}$; also compare [RH08].

Theorem 5.8 Assume Assumption 2.1 to be valid. Let $\tilde n = \min\{N, n_*(\tau)\}$ and
$\tau = c \log(\delta^{-1})$. Then it holds

$$\mathbb{E}\|x_{n_L}^\delta - x\|^2 \le c_L \big(\|x_{n_{\mathrm{opt}}} - x\|^2 + \log(\delta^{-1})\, \rho(n_{\mathrm{opt}})^2\big)$$

and

$$\mathbb{E}\|x_{\tilde n}^\delta - x\|^2 \le \tilde c\, \big(\|x_{n_{\mathrm{opt}}} - x\|^2 + \log(\delta^{-1})\, \rho(n_{\mathrm{opt}})^2\big).$$

Proof

The first part was proven in [BP05]; the second part is trivial, using that on the one
hand $n_L(\tau) \ge \min\{N, n_*(\tau)\}$. On the other hand, the risk of $n_*$ being too small (which
is not affected by the decorrelation effect) can be bounded from above by a multiple of
$\|x_{n_{\mathrm{opt}}} - x\|^2 + \rho(n_{\mathrm{opt}})^2$, as shown in the last theorem. □

6. Conclusion 

In numerical experiments [BL10] it is almost impossible to distinguish the results of
the Lepskij balancing principle and the newly introduced fast balancing. This can be 
interpreted as follows: 

• There is just a very low probability for outliers in the noise which influence the 
noise behavior p(-) after the optimal regularization parameter. 

• There is an extremely low probability that the noise modifies the data in such a 
way that one stops too early. 

However, considering the computation time, the new method has big advantages; it can
easily compete with other methods like the Morozov discrepancy principle.

In conclusion, we have shown that this modification of the Lepskij balancing 
principle is very well suited for practice and should replace it in all time-critical 
applications as long as one does not need to fear big frequency gaps in the solutions. 

This analysis has been done only for Tikhonov regularization. A similar analysis 
imposing stricter requirements on the solution x was performed for truncated Singular 
Value Decomposition (TSVD) in [Bau10].

Furthermore, large numerical experiments [BL10] show that the newly defined
method works very well and, in contrast to most other parameter choice regimes, can 
cope with colored noise without any performance loss. In these experiments it was 
observed that the factor C in the oracle inequality is at most around 2. The method 
is very stable, i.e., the number of observed outliers is very low, both for Tikhonov and 
Spectral-Cut-Off regularization. This behavior does not change when one replaces the 
exact noise behavior by an estimation based on several measurements [Bau10].